GitHub Scraper

Pricing

from $1.50 / 1,000 items returned

Search GitHub repositories or users via the public REST API and get clean, structured rows: stars, forks, issues, language, topics, license, dates for repos; name, bio, company, location, followers for users. No key needed; add a token for higher limits.

Developer

Dami's Studio

Maintained by Community

Search GitHub repositories or users via the public GitHub REST API and get back clean, structured rows. No API key is required, but adding a free GitHub token raises the rate limit from 60 to 5000 requests/hr, which matters for larger jobs.

What you get

Repositories (type: repositories): fullName, name, owner, url, description, stars, forks, openIssues, language, topics[], license (SPDX id), homepage, defaultBranch, createdAt, updatedAt, pushedAt.

Users / organizations (type: users) — each result is enriched with profile details: login, url, type, id, name, bio, company, location, blog, followers, publicRepos, createdAt.

Every successful row also carries ok: true. Diagnostic rows (no results, bad input, rate limit, network) carry ok: false plus an errorCode and error message, and are never charged.

Nullable fields: GitHub only returns what a repo/user actually sets, so optional fields are null when absent — e.g. repo description, language, license, homepage; user name, bio, company, location, blog. These nulls are normal and still count as complete rows.
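A minimal sketch of how a consumer might separate data rows from diagnostic rows and cope with null optional fields. The helper names (split_rows, describe_repo) and the sample rows are invented for illustration; only the field names ok, errorCode, fullName, stars, language, and license come from the description above.

```python
from typing import Any

Row = dict[str, Any]


def split_rows(rows: list[Row]) -> tuple[list[Row], list[Row]]:
    """Separate data rows (ok: true) from diagnostic rows (ok: false)."""
    data = [r for r in rows if r.get("ok") is True]
    diagnostics = [r for r in rows if r.get("ok") is not True]
    return data, diagnostics


def describe_repo(row: Row) -> str:
    """Format a repository row, using placeholders where optional fields are null."""
    language = row.get("language") or "unknown language"
    license_id = row.get("license") or "no license"
    return f"{row['fullName']} ({language}, {license_id}, {row['stars']} stars)"


# Sample rows are made up for this sketch; they are not real actor output.
sample = [
    {"ok": True, "fullName": "octo/demo", "stars": 1200, "language": None, "license": "MIT"},
    {"ok": False, "errorCode": "NO_RESULTS", "error": "No matches"},
]

data, diagnostics = split_rows(sample)
print([describe_repo(r) for r in data])
print([d["errorCode"] for d in diagnostics])
```

Treating null as "not set" rather than as an error keeps rows with a missing language or license in the result set, which matches the note that such rows are complete.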

Input

| Field | Notes |
| --- | --- |
| query | GitHub search syntax. Repos: language:python stars:>1000 machine learning, topic:cli. Users: location:berlin followers:>500. |
| type | repositories (default) or users. |
| sort | stars (default), forks, updated, or best-match. Applies to repository searches. |
| maxItems | Default 100, max 1000 (GitHub's Search API cap). |
| githubToken | Optional but recommended. Raises limits to 5000 req/hr and 30 searches/min. No scopes needed for public data. Kept private. |
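The input fields can be assembled and checked on the client side before a run is started. The sketch below is one possible way to do that; build_input is a hypothetical helper, not part of the actor, and the lower bound of 1 for maxItems is an assumption. Raising on an empty query is a local design choice: the actor itself returns a BAD_INPUT row in that case.

```python
from typing import Any, Optional

VALID_TYPES = {"repositories", "users"}
VALID_SORTS = {"stars", "forks", "updated", "best-match"}
MAX_ITEMS_CAP = 1000  # documented maximum (GitHub's Search API cap)


def build_input(
    query: str,
    search_type: str = "repositories",
    sort: str = "stars",
    max_items: int = 100,
    github_token: Optional[str] = None,
) -> dict[str, Any]:
    """Build an input object using the documented field names and defaults."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty search string")
    if search_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    # Assumption: at least one item must be requested.
    if not 1 <= max_items <= MAX_ITEMS_CAP:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_CAP}")

    run_input: dict[str, Any] = {
        "query": query.strip(),
        "type": search_type,
        "maxItems": max_items,
    }
    # sort is documented as applying to repository searches only.
    if search_type == "repositories":
        run_input["sort"] = sort
    if github_token:
        run_input["githubToken"] = github_token
    return run_input


run_input = build_input("language:python stars:>1000", max_items=200)
print(run_input)
```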

Output

One dataset row per repository or user, deduplicated by fullName / login. Queries with no matches return a single NO_RESULTS row and are not charged. An empty/missing query returns a single BAD_INPUT row (also not charged) instead of failing the run.

Rate limits

GitHub allows 60 requests/hr unauthenticated (10 searches/min) and 5000/hr with a token (30 searches/min). The Search API also caps results at 1000 per query. If you hit a limit, the actor does not fail silently: it returns a clear RATE_LIMITED row that includes the reset time and suggests adding a githubToken. This applies to user searches too. The per-user profile enrichment step makes one request per result, so a tokenless user search can exhaust the 60/hr budget; if that happens mid-enrichment, the actor surfaces a RATE_LIMITED row rather than silently returning zero rows.
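The arithmetic behind the user-search warning can be sketched as follows. This is an estimate under stated assumptions, not a description of the actor's internals: it assumes search pages of 100 results (GitHub's maximum page size), that search calls and profile calls are counted against separate limits as GitHub's rate limit documentation describes, and that the budget is full when the run starts. The function name estimate_user_search is hypothetical.

```python
import math

# Limits quoted in the text above.
CORE_LIMIT_PER_HOUR = {"anonymous": 60, "token": 5000}
SEARCH_LIMIT_PER_MINUTE = {"anonymous": 10, "token": 30}
PER_PAGE = 100  # assumption: the largest page size GitHub search accepts


def estimate_user_search(max_items: int, auth: str = "anonymous") -> dict:
    """Estimate the requests a user search needs and whether they fit the limits."""
    search_requests = math.ceil(max_items / PER_PAGE)
    profile_requests = max_items  # one profile-detail request per result
    return {
        "search_requests": search_requests,
        "profile_requests": profile_requests,
        "fits_core_budget": profile_requests <= CORE_LIMIT_PER_HOUR[auth],
        "fits_search_budget": search_requests <= SEARCH_LIMIT_PER_MINUTE[auth],
    }


print(estimate_user_search(100, "anonymous"))
print(estimate_user_search(100, "token"))
```

Under these assumptions a tokenless user search for 100 results needs 100 profile requests, which exceeds the 60/hr budget, while the same search with a token fits comfortably.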

Troubleshooting

  • Got a RATE_LIMITED row: add a free githubToken (60 → 5000 req/hr) and re-run; user searches especially benefit because each result triggers a profile-detail request. The row includes rateLimitResetsAt so you know when to retry.
  • Got a NO_RESULTS row: the query ran but matched nothing; broaden it or check GitHub search qualifiers.
  • Got a BAD_INPUT row: the query was empty; provide a search string.
  • These diagnostic rows have ok: false and are never charged.
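The troubleshooting steps above could be automated along these lines. This is a sketch only: it assumes rateLimitResetsAt is an ISO 8601 timestamp string, which the listing does not state, and the helper names (seconds_until_reset, advice) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def seconds_until_reset(row: dict[str, Any], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait before retrying a RATE_LIMITED row, or None if not applicable."""
    if row.get("errorCode") != "RATE_LIMITED":
        return None
    raw = row.get("rateLimitResetsAt")
    if not raw:
        return None
    # Assumption: ISO 8601 string. A trailing "Z" is rewritten so that
    # datetime.fromisoformat accepts it on older Python versions.
    reset_at = datetime.fromisoformat(str(raw).replace("Z", "+00:00"))
    if reset_at.tzinfo is None:
        reset_at = reset_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


def advice(row: dict[str, Any]) -> str:
    """Map a diagnostic row to the matching troubleshooting hint."""
    code = row.get("errorCode")
    if code == "RATE_LIMITED":
        return "Add a githubToken or retry after the reset time."
    if code == "NO_RESULTS":
        return "Broaden the query or check the search qualifiers."
    if code == "BAD_INPUT":
        return "Provide a non-empty query."
    # Other diagnostics: fall back to the message carried by the row.
    return row.get("error") or "Unknown diagnostic."


# Invented sample row and clock, for illustration only.
row = {"ok": False, "errorCode": "RATE_LIMITED", "rateLimitResetsAt": "2024-01-01T12:30:00Z"}
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(advice(row), seconds_until_reset(row, now=fixed_now))
```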

Example

{ "query": "language:python stars:>5000 web framework", "type": "repositories", "sort": "stars", "maxItems": 100 }
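For orientation, the example input roughly corresponds to a request against GitHub's public search endpoint. The sketch below only builds the URL and makes no network call. How the actor actually maps its input to API parameters is not documented here, so the mapping (including descending order, a page size of 100, and dropping sort for best-match) is an assumption; search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def search_url(run_input: dict, page: int = 1, per_page: int = 100) -> str:
    """Build a GitHub Search API URL for an input object (assumed mapping)."""
    search_type = run_input.get("type", "repositories")
    params = {"q": run_input["query"], "per_page": per_page, "page": page}
    sort = run_input.get("sort", "stars")
    # GitHub uses best-match ordering when no sort parameter is sent.
    if search_type == "repositories" and sort != "best-match":
        params["sort"] = sort
        params["order"] = "desc"
    endpoint = "repositories" if search_type == "repositories" else "users"
    return f"{API_ROOT}/search/{endpoint}?{urlencode(params)}"


example = {
    "query": "language:python stars:>5000 web framework",
    "type": "repositories",
    "sort": "stars",
    "maxItems": 100,
}
url = search_url(example)
print(url)
```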

Notes

To pull more than 1000 results, split the job by a qualifier — e.g. star bands (stars:1000..5000, stars:5001..20000) or creation date windows (created:2022-01-01..2022-12-31).
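One way to apply the splitting advice is to generate one query per star band and merge the resulting rows by fullName. The helpers below (star_band_queries, merge_rows) are hypothetical, and the sample batches are invented; running each query through the actor is left out of the sketch.

```python
from typing import Any


def star_band_queries(base_query: str, boundaries: list[int], open_top: bool = False) -> list[str]:
    """Build one query per star band from ascending boundaries, e.g. [1000, 5000, 20000]."""
    queries = []
    for i, (low, high) in enumerate(zip(boundaries, boundaries[1:])):
        start = low if i == 0 else low + 1  # avoid overlapping the previous band
        queries.append(f"{base_query} stars:{start}..{high}")
    if open_top and boundaries:
        queries.append(f"{base_query} stars:>{boundaries[-1]}")
    return queries


def merge_rows(batches: list[list[dict[str, Any]]], key: str = "fullName") -> list[dict[str, Any]]:
    """Combine rows from several runs, keeping the first row seen for each key."""
    seen: dict[Any, dict[str, Any]] = {}
    for batch in batches:
        for row in batch:
            # Skip diagnostic rows and rows without the dedup key.
            if row.get("ok") and row.get(key) is not None:
                seen.setdefault(row[key], row)
    return list(seen.values())


queries = star_band_queries("language:python", [1000, 5000, 20000])
print(queries)

# Invented batches standing in for the output of two separate runs.
batches = [
    [{"ok": True, "fullName": "a/one", "stars": 1500}],
    [{"ok": True, "fullName": "a/one", "stars": 1500}, {"ok": True, "fullName": "b/two", "stars": 9000}],
    [{"ok": False, "errorCode": "NO_RESULTS"}],
]
merged = merge_rows(batches)
print([r["fullName"] for r in merged])
```

Each band should itself return fewer than 1000 results; if one does not, narrow it further or combine star bands with creation date windows.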